A MEASURE OF DISPERSION FOR ORDERED SERIES*
By W. L. Crum, Williamstown, Massachusetts



It is the object of this paper to call attention to the inadequacy of 
the standard deviation for the study of the dispersion of a statistical 
series the terms of which are ordered relative to a given variable, to 
examine certain considerations bearing upon the dispersion in such 
series, and to set up tentatively a new measure particularly applicable 
to ordered series. 

I. EXISTING MEASURES 

A statistical series may be assigned to one of two broad classes 
according as it consists merely of a list of numbers of indefinite arrange- 
ment, or has its items ordered relative to a particular variable. Typi- 
cal of the first class are the series composed of experimental measure- 
ments, and chief illustrations of the second class are to be found in 
historical series in the field of economic statistics. The common sta- 
tistical coefficients have been developed in the study of, and are pecu- 
liarly applicable to, series of the first class; and it is a question of con- 
siderable moment whether such coefficients are equally useful in the 
analysis of an ordered series of the second class. 

The chief measure of dispersion is the standard deviation, the square 
root of the mean squared deviation from the arithmetic mean. It is 
apparent from the method of calculation of the standard deviation 
that it can take no account of the arrangement of the variates. 

Example (i). Consider the two series 

(a) 3, 7, 4, 6, 2, 5, 8, 7, 3, 4, 7, 3, 5, 8, 2, 6

(b) 2, 3, 2, 3, 4, 5, 5, 6, 7, 7, 8, 7, 8, 6, 3, 4
obtained by arranging in two different orders the items

(c) 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8.

The standard deviation for both (a) and (b) is the same as for (c), 
namely, 1.94. Nevertheless, inspection of the series (a) and (b), or a 
glance at their representation in Figure 1, is sufficient to reveal the 
striking difference between the two: (a) fluctuates violently, whereas 
(b) advances with fairly stable sweep. This coefficient fails, therefore, 
to distinguish the erratic series (a) from the fairly smooth (b). 
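
The order-independence of the standard deviation can be checked directly. The following Python sketch is an added illustration, not part of the original computation; it assumes the population form of the standard deviation (divisor N), and the names `series_a`, `series_b`, `series_c` and `standard_deviation` are labels chosen for the illustration. With that divisor the computed value is 2.0, slightly above the 1.94 quoted in the text; the equality across the three arrangements, which is the point at issue, is unaffected.

```python
import math

# The three arrangements of example (i); the variable names are illustrative.
series_a = [3, 7, 4, 6, 2, 5, 8, 7, 3, 4, 7, 3, 5, 8, 2, 6]
series_b = [2, 3, 2, 3, 4, 5, 5, 6, 7, 7, 8, 7, 8, 6, 3, 4]
series_c = sorted(series_a)  # the items arranged by size


def standard_deviation(values):
    """Square root of the mean squared deviation from the arithmetic mean.

    Assumption: the divisor is N (population form), not N - 1.
    """
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


sd_a = standard_deviation(series_a)
sd_b = standard_deviation(series_b)
sd_c = standard_deviation(series_c)
print(sd_a, sd_b, sd_c)
```

Since the function never consults the position of an item, any rearrangement of the same sixteen values must return the same result.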

* Read before the American Mathematical Society on September 9, 1921, under the title "A tentative 
substitute for the standard deviation in the examination of the dispersion of an ordered statistical series." 
Similar objections may be raised to the other common measures of
dispersion. The coefficient of variation is derived from the standard 
deviation and has the same shortcoming. The computation of the 
mean deviation is independent of the order of arrangement of the 
variates, and hence it can give no better result than the standard 
deviation. The quartile deviation involves a grouping of the variates 
according to their size, regardless of the actual arrangement of the 
variates in their original form. 
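
The same check may be extended to the other measures named above. The sketch below is an added illustration under stated assumptions: the coefficient of variation is taken as the standard deviation (divisor N) divided by the mean, the quartile deviation as half the interquartile range, and the quartiles are those returned by Python's `statistics.quantiles` with its default convention, which may differ from the convention an author of the period would have used. The function names are hypothetical helpers.

```python
import statistics

# Series (a) and (b) of example (i).
series_a = [3, 7, 4, 6, 2, 5, 8, 7, 3, 4, 7, 3, 5, 8, 2, 6]
series_b = [2, 3, 2, 3, 4, 5, 5, 6, 7, 7, 8, 7, 8, 6, 3, 4]


def coefficient_of_variation(values):
    """Standard deviation (divisor N) expressed as a fraction of the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


def mean_deviation(values):
    """Mean absolute deviation from the arithmetic mean."""
    mean = statistics.fmean(values)
    return statistics.fmean([abs(v - mean) for v in values])


def quartile_deviation(values):
    """Half the interquartile range (quartile convention is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2


measures = (coefficient_of_variation, mean_deviation, quartile_deviation)
results = {m.__name__: (m(series_a), m(series_b)) for m in measures}
for name, (value_a, value_b) in results.items():
    print(name, value_a, value_b)
```

Each measure returns one value for the erratic series and the same value for the smooth one, as the argument requires.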

The conclusion seems warranted that existing measures of dispersion 
are not fitted to distinguish between such series as those of example (i). 
These coefficients attach the same importance to a given difference 
between two variates which are widely separated and two which are 
adjacent. They indicate the extent of the fluctuation, but fail to 
account for its rate. For certain purposes in the study of ordered 
series, particularly historical economic series, it would seem to be this 
very rate which it is of importance to examine and measure. Since 
the term dispersion is so widely understood in its present sense and is 
so generally accepted as the quality measured by the standard devia- 
tion, it may be well to avoid speaking of this rate of fluctuation as 
dispersion. We adopt provisionally the term fluctuation-rate. 



II. CORRELATION COEFFICIENT 



We seek now to measure the fluctuation-rate of an ordered series. It
is suggested elsewhere* that for an ordered series possessing rectilinear
trend, a more reliable measure of fluctuation than the ordinary
standard deviation, $\sigma_x$, is the generalized standard deviation

\[ \sigma_{x.t} = \sigma_x \sqrt{1 - r_{xt}^2} \]

since this latter in effect eliminates that part of the fluctuation due
to the trend. Now the quantity $r_{xt}$ in the above formula is the
coefficient of correlation between the $x_i$ (the series being examined)
and the $t_i$ (the variable relative to which the $x_i$ are ordered). We
examine in detail below the sufficiency of $r_{xt}$ as a measure of
fluctuation-rate, and any objections found will hold also against
$\sigma_{x.t}$.

* Crum, W. L., The Significance of the Partial Correlation Coefficient in the Comparison of Ordered
Statistical Series Possessing Rectilinear Trends. (Quarterly Publications of the American Statistical
Association, Vol. XVII, pp. 949-952.)

It would seem at first glance that the correlation coefficient $r_{xt}$
might serve as a measure of fluctuation-rate: the larger its numerical
value, the nearer a straight line is the join of the tops of the ordinates
representing the variates, and the less is the fluctuation. It is at once
necessary, however, to inquire whether a series of variates does not admit
of several arrangements, differing markedly in rate of fluctuation, but all
having the same value for the coefficient of correlation of the $x_i$
relative to the $t_i$.

In fact, if there are N variates,

\[ r_{xt} = \frac{\sum (t_i - T)(x_i - X)}{N \sigma_x \sigma_t} \]

where T and X are the arithmetic means and $\sigma_t$ and $\sigma_x$ are the
standard deviations of the $t_i$ and $x_i$ respectively. It is evident that
for a given group of $x_i$, N, $\sigma_t$, and $\sigma_x$ will all be fixed
constants, regardless of the order of arrangement of the $x_i$ relative to
the $t_i$. The value of $r_{xt}$ for the various arrangements will therefore
depend upon the product sum

\[ Np = \sum (t_i - T)(x_i - X). \]

Moreover, it is easily shown that this value of Np differs from

\[ A = \sum t_i x_i \]

only by the constant, NXT. Therefore, it is sufficient to examine the
various values of A which correspond to the different arrangements in
order to test the adequacy of $r_{xt}$ as a measure of fluctuation-rate.
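
The step described as easily shown may be written out. The following derivation is an added sketch; it uses only the definitions of the means, namely that the sum of the $t_i$ is NT and the sum of the $x_i$ is NX, with all sums running over the N variates.

```latex
\begin{aligned}
Np &= \sum (t_i - T)(x_i - X) \\
   &= \sum t_i x_i - X \sum t_i - T \sum x_i + NXT \\
   &= A - NXT - NXT + NXT \\
   &= A - NXT .
\end{aligned}
```

Since N, X, and T are the same for every arrangement of a given group, Np and A rise and fall together.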

The question now before us is: Are there several arrangements of a
given group of $x_i$ relative to the $t_i$, such that for all these
arrangements A is identical? If it could be shown that there is only one
arrangement corresponding to a given value of A, $r_{xt}$ would indeed be an
ideal index of fluctuation-rate. Or, if it develops that all the possible
arrangements giving a particular value of A are such that they do not differ
from one another in the character sought to be measured, namely the
fluctuation-rate, $r_{xt}$ would be entirely adequate to the purpose.
Unfortunately, it proves that $r_{xt}$ falls short of meeting even this
second requirement.

Without attempting at present an analytical study of the above
question, we can by experiment with numerical examples bring out the
essential fact about the failure of $r_{xt}$ to discriminate properly
between the various arrangements.
Example (ii). Suppose the $x_i$ consist of the integers 1, 2, 3, to be
arranged arbitrarily relative to $t_i$ = 1, 2, 3. The possible arrangements
are:

          (a)  (b)  (c)  (d)  (e)  (f)

A          14   13   11   13   11   10

$t_i$
1           1    1    2    2    3    3
2           2    3    3    1    1    2
3           3    2    1    3    2    1

the value of A being given at the head of each series.

Series (b) and (d) each have A 13, and they are clearly similar in their 
fluctuation-rates. This is evident from the series themselves, or from 
Figure 2. Indeed, the nature of the fluctuation may be inferred from 
the two dotted triangles (this aspect of the question will be viewed more 
fully below), and the one triangle is obviously obtainable from the 
other by a succession of reflections and inversions. Similar remarks 
apply to the two series (c) and (e), for which A is 11. 
On the other hand, although A is 13 for (b) and 11 for (c), these two
series have identical fluctuation-rates; for (c) is merely (b) in reverse
order. The same is true of series (d) and (e), and the objection is even
clearer in the case of (a) and (f): these pairs each have a single
fluctuation-rate, but the value of A differs for the two members of the
pair. It is apparent, then, that A does not have the same value for all
arrangements having the same fluctuation-rate; but this is not the real
question, which is whether all arrangements having the same A have equal
fluctuation-rates.
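
The values of A in this example can be reproduced by direct enumeration. The following Python sketch is an added illustration; `product_sum` is a hypothetical helper name, and the arrangements are identified by their terms rather than by the letters (a) to (f). It also shows that reversing an arrangement of 1, 2, 3 replaces A by 24 - A, which is why a series and its reverse, though alike in fluctuation-rate, carry different values of A.

```python
from itertools import permutations

t = (1, 2, 3)  # the variable relative to which the x_i are ordered


def product_sum(x, t):
    """A: the sum of the products t_i * x_i."""
    return sum(ti * xi for ti, xi in zip(t, x))


# Map each arrangement of (1, 2, 3) to its value of A.
values_of_A = {x: product_sum(x, t) for x in permutations((1, 2, 3))}

for x, A in values_of_A.items():
    reverse = x[::-1]
    print(x, "A =", A, "| reverse", reverse, "A =", values_of_A[reverse])
```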

Example (iii). Let the $x_i$ be the integers from 1 to 4. The 24
possible arrangements can be classified in the following frequency
table according to the value of A in the several cases:

A       20  21  22  23  24  25  26  27  28  29  30

freq.    1   3   1   4   2   2   2   4   1   3   1
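
The frequency table can be rebuilt by enumerating the 24 arrangements. The sketch below is an added illustration and assumes, as in example (ii), that the $t_i$ are the integers 1 to 4 in natural order; the name `frequency` is chosen for the illustration.

```python
from collections import Counter
from itertools import permutations

t = (1, 2, 3, 4)  # assumed ordering variable: the integers 1 to 4

# Count how many arrangements of (1, 2, 3, 4) give each value of A.
frequency = Counter(
    sum(ti * xi for ti, xi in zip(t, x)) for x in permutations(t)
)

for A in sorted(frequency):
    print(A, frequency[A])
```

The table is symmetrical about A = 25 because reversing an arrangement replaces A by 50 - A.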

It is found, on inspection, that the series having A 20 and 30 are
reverses of each other, and similarly those having A 21 and 29 are
reverses in pairs, and so on throughout. Hence, the objection found
under example (ii) arises here also, and in even more striking degree.
We examine next in detail the four series having A 27. They are:
(a) 1, 3, 4, 2; (b) 1, 4, 2, 3; (c) 2, 3, 1, 4; (d) 3, 1, 2, 4

and their diagrams appear in Figure 3. The series themselves and
the diagrams show that (b) and (c) constitute a pair having equal
fluctuation-rates, and so do (a) and (d); but the two pairs differ
distinctly from each other. Of course, since they are the reverses of the
above four series, the four series having A 23 are also grouped in pairs.
Moreover, if we test the three series having A 29, it will be found that
only two of them have identical fluctuation-rates.
It appears, therefore, that even for series of the simple sort given
in this last example, the value of A, and hence of $r_{xt}$, does not
furnish an adequate measure of the fluctuation-rate: it varies when the
fluctuation-rate remains unchanged, and remains unchanged when the
fluctuation-rate varies. As the series become more complicated we shall have
diminishing confidence in the sufficiency of the correlation coefficient
for this measurement. We must hesitate even to say that it is surely
better than the standard deviation; for whereas the latter gives one
value for all 24 cases in example (iii), the correlation coefficient does
indeed make a distinction; but we have seen that it makes a distinction
where none exists in the fluctuation-rate.



III. THE MEAN SQUARED SECOND ORDER DIFFERENCE 

We recall that one way of recognizing the equality of the fluctuation- 
rates of two series was by the geometric correspondence of the triangles 
joining the tops of the ordinates. This was most apparent in the 
simplest case, shown in example (ii) and Figure 2. The area of the 
dotted triangle suggests itself as a measure of the fluctuation-rate. 
In a series of more than three variates there would be a series of these 
triangles, each belonging to three adjacent variates: for N variates 
there would be $N-2$ triangles. At any point of the ordered series the
fluctuation-rate might be measured by the area of the triangle belonging
to the three nearest variates; and, for the whole series, the
fluctuation-rate would be indicated by an average of the $N-2$ triangular
areas.

If we assume for simplicity that the $t_i$ are consecutive integers, the
area of the triangle belonging to three successive variates $x_{i-1}$,
$x_i$, $x_{i+1}$, is

\[ \text{Area} = \tfrac{1}{2}(x_{i-1} + x_{i+1}) - x_i. \]

For series (b) and (d) of example (ii) we get for the triangular area
$-3/2$ and $3/2$ respectively; and these are equal if the sign is neglected.
This is in accord with our inferences from the series and the figures.
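
The area expression can be recovered from the usual determinant formula for the signed area of a triangle. The derivation below is an added sketch; it assumes unit spacing of the ordering variable, so that $t_{i-1} = t_i - 1$ and $t_{i+1} = t_i + 1$, and a sign convention chosen to agree with the values quoted for series (b) and (d).

```latex
\begin{aligned}
\text{Area}
  &= \tfrac{1}{2}\left[ t_{i-1}(x_i - x_{i+1}) + t_i(x_{i+1} - x_{i-1})
       + t_{i+1}(x_{i-1} - x_i) \right] \\
  &= \tfrac{1}{2}\left[ -(x_i - x_{i+1}) + (x_{i-1} - x_i) \right] \\
  &= \tfrac{1}{2}(x_{i-1} + x_{i+1}) - x_i .
\end{aligned}
```

The terms in $t_i$ cancel in passing from the first line to the second, so the area depends only on the three variates and not on their position in the series.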

In averaging the $N-2$ triangles in a series of N $x_i$, we may neglect
the signs and take the simple arithmetic mean, or we may take the
square root of the mean squared area. We choose the latter, and have
for the average area:

\[ \sqrt{ \frac{1}{N-2} \sum_{i=2}^{N-1} \left[ \tfrac{1}{2}(x_{i-1} + x_{i+1}) - x_i \right]^2 } \]

The type expression in this summation is the square of

\[ \tfrac{1}{2}(x_{i-1} + x_{i+1}) - x_i \]

which is

\[ \tfrac{1}{2}\left[ (x_{i+1} - x_i) - (x_i - x_{i-1}) \right]. \]

This is precisely $\tfrac{1}{2}$ the second order finite difference,
$\Delta''_i$. Hence the average triangular area is one-half of

\[ F = \sqrt{ \frac{1}{N-2} \sum_{i=2}^{N-1} (\Delta''_i)^2 } \]

and, if we discard the $\tfrac{1}{2}$, we may take F as the new coefficient
to measure the fluctuation-rate, viz., the square root of the mean squared
second order difference.
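
The definition translates directly into a short routine. The following Python sketch is an added illustration, not the author's own procedure; the function names `second_differences` and `fluctuation_rate` are hypothetical, and the two test series are assumed to be the arrangements 1, 3, 2 and 2, 1, 3, which reproduce the triangular areas -3/2 and 3/2 quoted for series (b) and (d) of example (ii).

```python
import math


def second_differences(x):
    """Second order differences x[i-1] - 2*x[i] + x[i+1] for interior terms."""
    return [x[i - 1] - 2 * x[i] + x[i + 1] for i in range(1, len(x) - 1)]


def fluctuation_rate(x):
    """F: square root of the mean squared second order difference."""
    d2 = second_differences(x)
    if not d2:
        raise ValueError("at least three terms are needed")
    return math.sqrt(sum(d * d for d in d2) / len(d2))


# Assumed forms of series (b) and (d) of example (ii).
print(second_differences([1, 3, 2]), fluctuation_rate([1, 3, 2]))
print(second_differences([2, 1, 3]), fluctuation_rate([2, 1, 3]))
```

Squaring removes the sign of each second difference, so the two series, whose triangles differ only in orientation, receive the same F.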

We arrive at F as a measure of the fluctuation-rate by study of the
deviation triangles, but the present definition in terms of the second
differences gives added reason for accepting it. In fact, the first order
finite difference $\Delta'_i$ is concerned with the slope of the join of two
successive ordinates, whereas the second order difference $\Delta''_i$ has to
do with the curvature of the join of three successive ordinates. It is this
curvature, averaged over the entire extent of the series, that it is sought
to measure.

It is clear that the actual calculation of F is very simple, and indeed 
scarcely more complicated than that of $\sigma$. Although it is necessary to
calculate both the first and second differences, it is not necessary to 
make any correction for the position of the mean. As an example, we 
calculate F for the two series of example (i). 
Case (a)

$x_i$            3   7   4   6   2   5   8   7   3   4   7   3   5   8   2   6
$\Delta'_i$        4  -3   2  -4   3   3  -1  -4   1   3  -4   2   3  -6   4
$\Delta''_i$        -7   5  -6   7   0  -4  -3   5   2  -7   6   1  -9  10

F = 5.85

Case (b)

$x_i$            2   3   2   3   4   5   5   6   7   7   8   7   8   6   3   4
$\Delta'_i$        1  -1   1   1   1   0   1   1   0   1  -1   1  -2  -3   1
$\Delta''_i$        -2   2   0   0  -1   1   0  -1   1  -2   2  -3  -1   4

F = 1.81

Evidently F distinguishes readily the erratic series (a) from the fairly 
regular (b). 
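
The two calculations can be repeated mechanically. The Python sketch below is an added illustration; it forms the first differences, differences them again, and takes the square root of the mean square of the result, with `differences` and `results` being names chosen for the illustration. The values of F obtained agree with those quoted to within a unit in the second decimal place.

```python
import math


def differences(values):
    """Successive differences: each term minus the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


series = {
    "a": [3, 7, 4, 6, 2, 5, 8, 7, 3, 4, 7, 3, 5, 8, 2, 6],
    "b": [2, 3, 2, 3, 4, 5, 5, 6, 7, 7, 8, 7, 8, 6, 3, 4],
}

results = {}
for label, x in series.items():
    d1 = differences(x)   # first order differences (N - 1 of them)
    d2 = differences(d1)  # second order differences (N - 2 of them)
    F = math.sqrt(sum(d * d for d in d2) / len(d2))
    results[label] = (d1, d2, F)
    print("Case", label)
    print("  first differences: ", d1)
    print("  second differences:", d2)
    print("  F =", F)
```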

We turn now to those series of example (iii) for which A is 27. The 
values of F for the four series are: 

(a) 2.24; (b) 4.12; (c) 4.12; (d) 2.24. 

We recall from Figure 3 that (c) is the "reflection" of (b), and has the 
same fluctuation-rate. This fact, which was not, to be sure, out of 
accord with the single value 27 for A, is borne out by this new coef- 
ficient. The same is true of (a) and (d). On the other hand, the dis- 
tinction between the two pairs, which was not indicated by any change 
in A, is clearly shown by the differing value of F. Moreover, a cal- 
culation of the values of F for the four series having A 23 shows the 
same values as above, and the series in the A 23 group which are the 
reverses of (a) and (d) in the A 27 group have also F equal to 2.24, and 
similarly for the other pair. Similar tests apply throughout, and we 
may, therefore, say that both the objections raised against A are met 
by F.
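
The statement that similar tests apply throughout can be examined for all 24 arrangements of example (iii). The sketch below is an added illustration; it assumes $t_i$ = 1, 2, 3, 4, and the helper names `product_sum` and `fluctuation_rate` are hypothetical. It checks that F is unchanged when a series is reversed, and lists F within the groups having A equal to 23, 27, and 29.

```python
import math
from collections import defaultdict
from itertools import permutations


def product_sum(x):
    """A: sum of t_i * x_i, assuming t_i = 1, 2, ..., N."""
    return sum(t * xi for t, xi in enumerate(x, start=1))


def fluctuation_rate(x):
    """F: square root of the mean squared second order difference."""
    d2 = [x[i - 1] - 2 * x[i] + x[i + 1] for i in range(1, len(x) - 1)]
    return math.sqrt(sum(d * d for d in d2) / len(d2))


arrangements = list(permutations((1, 2, 3, 4)))

# First objection: a series and its reverse should receive the same value.
reversal_invariant = all(
    math.isclose(fluctuation_rate(x), fluctuation_rate(x[::-1]))
    for x in arrangements
)

# Second objection: arrangements sharing one value of A may still differ in F.
groups = defaultdict(dict)
for x in arrangements:
    groups[product_sum(x)][x] = round(fluctuation_rate(x), 2)

print("F unchanged by reversal:", reversal_invariant)
for A in (23, 27, 29):
    print("A =", A, groups[A])
```

Within the group having A 29 the sketch gives two series with one value of F and a third with another, in agreement with the remark made under example (iii).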

Although we are not yet in a position to say with finality that F 
furnishes the best measure of the fluctuation-rate in an ordered sta- 
tistical series, the tests which we have made above seem to indicate 
that it may be accepted tentatively as such measure. It is hoped that 
further study of the coefficient F will serve to develop more fully the 
real significance of the square root of the mean squared second order 
difference, and perhaps result in devising another measure even more 
efficacious than F. 



